
    A Conversation with Alan Gelfand

    Alan E. Gelfand was born April 17, 1945, in the Bronx, New York. He attended public grade schools and did his undergraduate work at what was then called City College of New York (CCNY, now CUNY), excelling at mathematics. He then surprised and saddened his mother by going all the way across the country to Stanford for graduate school, where he completed his dissertation in 1969 under the direction of Professor Herbert Solomon, making him an academic grandson of Herman Rubin and Harold Hotelling. Alan then accepted a faculty position at the University of Connecticut (UConn), where he was promoted to tenured associate professor in 1975 and to full professor in 1980. A few years later he became interested in decision theory, then empirical Bayes, which eventually led to the publication of Gelfand and Smith [J. Amer. Statist. Assoc. 85 (1990) 398-409], the paper that introduced the Gibbs sampler to most statisticians and revolutionized Bayesian computing. In the mid-1990s, Alan's interests turned strongly to spatial statistics, leading to fundamental contributions in spatially varying coefficient models, coregionalization, and spatial boundary analysis (wombling). He spent 33 years on the faculty at UConn, retiring in 2002 to become the James B. Duke Professor of Statistics and Decision Sciences at Duke University, where he served as chair from 2007 to 2012. At Duke, he has continued his work in spatial methodology while increasing his impact in the environmental sciences. To date, he has published over 260 papers and 6 books; he has also supervised 36 Ph.D. dissertations and 10 postdocs. This interview was conducted just prior to a conference of his family, academic descendants, and colleagues, held April 19-22, 2015 at Duke University to celebrate his 70th birthday and his contributions to statistics.

    Comment: Published at http://dx.doi.org/10.1214/15-STS521 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Centered Partition Process: Informative Priors for Clustering

    There is a very rich literature proposing Bayesian approaches for clustering starting with a prior probability distribution on partitions. Most approaches assume exchangeability, leading to simple representations in terms of Exchangeable Partition Probability Functions (EPPFs). Gibbs-type priors encompass a broad class of such cases, including Dirichlet and Pitman-Yor processes. Even though there have been some proposals to relax the exchangeability assumption, allowing covariate dependence and partial exchangeability, limited consideration has been given to how to include concrete prior knowledge on the partition. For example, we are motivated by an epidemiological application in which we wish to cluster birth defects into groups and have prior knowledge of an initial clustering provided by experts. As a general approach for including such prior knowledge, we propose a Centered Partition (CP) process that modifies the EPPF to favor partitions close to an initial one. Some properties of the CP prior are described, a general algorithm for posterior computation is developed, and we illustrate the methodology through simulation examples and an application to the motivating epidemiology study of birth defects.
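
    To make the centering idea concrete, here is a minimal sketch of the construction suggested by the description above (the notation and the choice of distance are ours, not necessarily the paper's). Given a baseline EPPF p_0, an expert-supplied initial partition rho_0, and a distance d between partitions, the CP process reweights the baseline to favor partitions near rho_0:

        p(\rho \mid \rho_0, \psi) \;\propto\; p_0(\rho)\, \exp\{-\psi\, d(\rho, \rho_0)\}, \qquad \psi \ge 0.

    Setting psi = 0 recovers the baseline prior, while large psi concentrates the prior around the expert clustering.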

    Spatial predictions on physically constrained domains: Applications to Arctic sea salinity data

    In this paper, we predict sea surface salinity (SSS) in the Arctic Ocean based on satellite measurements. SSS is a crucial indicator of ongoing changes in the Arctic Ocean and can offer important insights about climate change. We particularly focus on areas of water mistakenly flagged as ice by satellite algorithms. To remove bias in the retrieval of salinity near sea ice, the algorithms use conservative ice masks, which result in considerable loss of data. We aim to produce realistic SSS values for such regions to obtain a more complete picture of the SSS surface over the Arctic Ocean and to benefit future applications that may require SSS measurements near the edges of sea ice or coasts. We propose a class of scalable nonstationary processes that can handle large satellite datasets and the complex geometry of the Arctic Ocean. Our method, the Barrier Overlap-Removal Acyclic directed graph GP (BORA-GP), constructs sparse directed acyclic graphs (DAGs) with neighbors conforming to barriers and boundaries, enabling characterization of dependence in constrained domains. BORA-GP produces more sensible SSS values in regions without satellite measurements and shows improved performance in various constrained domains in simulation studies compared to state-of-the-art alternatives. The R package is available at https://github.com/jinbora0720/boraGP.
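
    The neighbor-selection step can be illustrated with a short, self-contained sketch (illustrative Python, not the boraGP R package API; all names here are our own). For each location, we keep the nearest previously ordered points whose connecting segment does not cross a barrier, which yields a sparse DAG of the kind described above:

        # Sketch: barrier-aware nearest-neighbor DAG in the spirit of BORA-GP.
        import numpy as np

        def segments_cross(p1, p2, q1, q2):
            """True if segments p1-p2 and q1-q2 properly intersect."""
            def cross(o, a, b):
                return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
            d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
            d3, d4 = cross(p1, p2, q1), cross(p1, p2, q2)
            return d1 * d2 < 0 and d3 * d4 < 0

        def barrier_neighbor_dag(coords, barriers, m=10):
            """Parents of point i: up to m nearest previously ordered points
            reachable without crossing any barrier segment."""
            n = len(coords)
            parents = [[] for _ in range(n)]
            for i in range(1, n):
                dists = np.linalg.norm(coords[:i] - coords[i], axis=1)
                for j in np.argsort(dists):          # nearest candidates first
                    blocked = any(segments_cross(coords[i], coords[j], b1, b2)
                                  for b1, b2 in barriers)
                    if not blocked:
                        parents[i].append(int(j))
                    if len(parents[i]) == m:
                        break
            return parents                           # edge j -> i for j in parents[i]

        # Example: 200 random sites and one vertical barrier through the domain.
        # coords = np.random.default_rng(1).random((200, 2))
        # dag = barrier_neighbor_dag(coords, [((0.5, 0.0), (0.5, 0.6))], m=8)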

    Nonparametric Bayes Shrinkage for Assessing Exposures to Mixtures Subject to Limits of Detection

    Assessing potential associations between exposures to complex mixtures and health outcomes may be complicated by a lack of knowledge of the causal components of the mixture, highly correlated mixture components, potential synergistic effects of mixture components, and difficulties in measurement. We extend recently proposed nonparametric Bayes shrinkage priors for model selection to investigations of complex mixtures by developing a formal hierarchical modeling framework that allows different degrees of shrinkage for main effects and interactions and handles truncation of exposures at a limit of detection. The methods are used to shed light on data from a study of endometriosis and exposure to environmental polychlorinated biphenyl congeners.
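
    A hedged sketch of the two model ingredients described above, in our own notation (the paper's exact hierarchy may differ): a regression with separately shrunk main effects and interactions, and exposures observed only above exposure-specific detection limits,

        y_i = \beta_0 + \sum_j \beta_j x_{ij} + \sum_{j<k} \gamma_{jk}\, x_{ij} x_{ik} + \epsilon_i,
        \qquad \beta_j \sim \Pi_{\mathrm{main}}, \quad \gamma_{jk} \sim \Pi_{\mathrm{int}},
        \qquad x_{ij} \text{ observed only if } x_{ij} > \mathrm{LOD}_j,

    where Pi_main and Pi_int are shrinkage priors with their own degrees of sparsity, and values falling below LOD_j are treated as left-censored and imputed from their truncated distribution during posterior computation.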

    mpower: An R Package for Power Analysis via Simulation for Correlated Data

    Estimating sample size and statistical power is an essential part of good study design. This R package allows users to conduct power analysis based on Monte Carlo simulations in settings where the correlations between predictors matter. It runs power analyses given a data generative model and an inference model, and it can set up a data generative model that preserves dependence structures among variables from either existing data (continuous, binary, or ordinal) or high-level descriptions of the associations. Users can generate power curves to assess the trade-offs between sample size, effect size, and power of a design. This paper presents tutorials and examples focusing on applications to environmental mixture studies, where predictors tend to be moderately to highly correlated. The package easily interfaces with several existing and newly developed analysis strategies for assessing associations between exposures and health outcomes, but it is sufficiently general to facilitate power simulations in a wide variety of settings.
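
    The workflow the package automates can be sketched generically (illustrative Python rather than the mpower R API; names and defaults are our own): simulate correlated predictors, generate outcomes under an assumed effect, fit the inference model, and report the rejection rate as estimated power:

        # Generic Monte Carlo power estimate with correlated predictors
        # (a sketch of the workflow; not the mpower package itself).
        import numpy as np
        from scipy import stats

        def power_sim(n, beta, rho, n_sims=1000, alpha=0.05, seed=0):
            rng = np.random.default_rng(seed)
            p = len(beta)
            # Exchangeable correlation: unit variances, pairwise correlation rho.
            cov = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)
            hits = 0
            for _ in range(n_sims):
                X = rng.multivariate_normal(np.zeros(p), cov, size=n)
                y = X @ beta + rng.standard_normal(n)
                # OLS fit; two-sided t-test on the first coefficient.
                XtX_inv = np.linalg.inv(X.T @ X)
                b = XtX_inv @ X.T @ y
                resid = y - X @ b
                s2 = resid @ resid / (n - p)
                se = np.sqrt(s2 * XtX_inv[0, 0])
                if 2 * stats.t.sf(abs(b[0] / se), df=n - p) < alpha:
                    hits += 1
            return hits / n_sims

        # e.g. power to detect beta1 = 0.2 among 5 predictors correlated at 0.6:
        # power_sim(n=200, beta=np.array([0.2, 0, 0, 0, 0]), rho=0.6)

    Sweeping n over a grid and plotting power_sim against n produces the power curves described above.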

    Accelerometry-Assessed Latent Class Patterns of Physical Activity and Sedentary Behavior With Mortality

    Latent class analysis provides a method for understanding patterns of physical activity and sedentary behavior. This study explored the association of accelerometer-assessed patterns of physical activity and sedentary behavior with all-cause mortality.
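
    For readers less familiar with the method, latent class analysis posits a finite mixture over C unobserved classes (a standard formulation, not specific to this study):

        P(y_i) = \sum_{c=1}^{C} \pi_c \prod_{j} P(y_{ij} \mid \text{class } c),

    so each person's accelerometer-derived indicators are modeled as conditionally independent given class membership, and estimated class memberships are then related to all-cause mortality.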

    Bayesian Functional Principal Component Analysis using Relaxed Mutually Orthogonal Processes

    Functional Principal Component Analysis (FPCA) is a prominent tool to characterize variability and reduce dimension of longitudinal and functional datasets. Bayesian implementations of FPCA are advantageous because of their ability to propagate uncertainty in subsequent modeling. To ease computation, many modeling approaches rely on the restrictive assumption that functional principal components can be represented through a pre-specified basis. Under this assumption, inference is sensitive to the basis, and misspecification can lead to erroneous results. Alternatively, we develop a flexible Bayesian FPCA model using Relaxed Mutually Orthogonal (ReMO) processes. We define ReMO processes to enforce mutual orthogonality between principal components to ensure identifiability of model parameters. The joint distribution of ReMO processes is governed by a penalty parameter that determines the degree to which the processes are mutually orthogonal and is related to ease of posterior computation. In comparison to other methods, FPCA using ReMO processes provides a more flexible, computationally convenient approach that facilitates accurate propagation of uncertainty. We demonstrate our proposed model using extensive simulation experiments and in an application to study the effects of breastfeeding status, illness, and demographic factors on weight dynamics in early childhood. Code is available on GitHub at https://github.com/jamesmatuk/ReMO-FPC
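
    A schematic of the relaxation described above, in our own notation (the paper's exact construction may differ): independent GP priors on the K component functions are tilted by a penalty on pairwise inner products,

        p(f_1, \ldots, f_K) \;\propto\; \Big[\prod_{k=1}^{K} \mathrm{GP}(f_k; 0, \kappa)\Big] \exp\Big\{-\lambda \sum_{k<l} \langle f_k, f_l \rangle^2\Big\},

    so that lambda -> infinity enforces exact mutual orthogonality, while moderate lambda keeps the posterior easy to sample, matching the trade-off mentioned above.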

    Identifiable and interpretable nonparametric factor analysis

    Factor models have been widely used to summarize the variability of high-dimensional data through a set of factors with much lower dimensionality. Gaussian linear factor models have been particularly popular due to their interpretability and ease of computation. However, in practice, data often violate the multivariate Gaussian assumption. To characterize higher-order dependence and nonlinearity, models that include factors as predictors in flexible multivariate regression are popular, with GP-LVMs using Gaussian process (GP) priors for the regression function and VAEs using deep neural networks. Unfortunately, such approaches lack identifiability and interpretability and tend to produce brittle and non-reproducible results. To address these problems, we propose the NIFTY framework, which simplifies the nonparametric factor model while maintaining flexibility: it parsimoniously transforms uniform latent variables using one-dimensional nonlinear mappings and then applies a linear generative model. The induced multivariate distribution falls into a flexible class while maintaining simple computation and interpretation. We prove that this model is identifiable and empirically study NIFTY using simulated data, observing good performance in density estimation and data visualization. We then apply NIFTY to bird song data in an environmental monitoring application.

    Comment: 50 pages, 17 figures.
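
    Reading the description above literally, the generative structure can be sketched as (our notation; the paper's parameterization may differ): uniform latent variables are passed through coordinate-wise one-dimensional nonlinear maps and then through a linear model,

        u_i \sim \mathrm{Unif}(0,1)^K, \qquad \eta_{ik} = g_k(u_{ik}), \qquad y_i = \Lambda \eta_i + \epsilon_i,

    with the g_k being nonparametric one-dimensional maps; the identifiability the paper proves would then rest on constraints tying down the g_k and Lambda.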

    Bayesian joint modeling of chemical structure and dose response curves

    Today there are approximately 85,000 chemicals regulated under the Toxic Substances Control Act, with around 2,000 new chemicals introduced each year. It is impossible to screen all of these chemicals for potential toxic effects, either via full-organism in vivo studies or in vitro high-throughput screening (HTS) programs. Toxicologists face the challenge of choosing which chemicals to screen and of predicting the toxicity of as-yet-unscreened chemicals. Our goal is to describe how variation in chemical structure relates to variation in toxicological response, enabling in silico toxicity characterization designed to meet both of these challenges. With our Bayesian partially Supervised Sparse and Smooth Factor Analysis (BS3FA) model, we learn a distance between chemicals targeted to toxicity, rather than one based on molecular structure alone. Our model also enables the prediction of chemical dose-response profiles based on chemical structure (that is, without in vivo or in vitro testing) by taking advantage of a large database of chemicals that have already been tested for toxicity in HTS programs. We show superior performance in distance learning in simulations and modest to large gains in predictive ability compared to existing methods. Results from the high-throughput screening data application elucidate the relationship between chemical structure and a toxicity-relevant high-throughput assay. An R package for BS3FA is available online at https://github.com/kelrenmor/bs3fa.
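
    A hedged sketch of the shared-factor idea (our notation, simplified relative to the paper): structure features x_i and dose-response curves y_i(d) load on common latent factors eta_i, with part of the structure variation left unsupervised,

        x_i = \Lambda \eta_i + \Gamma \xi_i + e_i \quad \text{(structure features)}, \qquad
        y_i(d) = f(d)^{\top} \Theta \eta_i + \epsilon_i(d) \quad \text{(dose-response curve)},

    so distances between chemicals in eta-space reflect toxicity-relevant structure, and a new chemical's dose-response profile can be predicted from x_i alone through its inferred eta_i.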